
adds count num sequences and tokens metric #346

Merged
2 commits merged into master on Mar 21, 2024

Conversation

mosheraboh (Collaborator):

No description provided.

SagiPolaczek (Collaborator) previously approved these changes on Mar 20, 2024 and left a comment:

LGTM, thanks!

I added questions, typing mistakes, and leftover artifacts inline. None of them change the logic.

from typing import List, Optional, Tuple

import torch
import numpy as np

from fuse.eval.metrics.metrics_common import MetricPerBatchDefault


class MetricCountSeqAndTokens(MetricPerBatchDefault):

SagiPolaczek (Collaborator):

General question:
I'm not sure counting the sequences and tokens should be defined as a metric. I don't have another suggestion, it just sounds weird :)

What do you think?

mosheraboh (Collaborator Author):

It uses the metric mechanism, and it's OK with me that it just counts some stats.

) -> None:
"""
:param encoder_input: key to the encoder_input
:param ignore_index: token_id to ignore (not to count), typically pad token id

SagiPolaczek (Collaborator):

I think it should be able to support a list of token ids to ignore, unless you want to force the user to ignore only the PAD one.

mosheraboh (Collaborator Author):

I went with just one to be more efficient - typically we would like to just skip the padding.
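For context on the trade-off discussed above, a minimal sketch (illustrative only, not the PR's implementation) of why a single ignore_index keeps the count a single vectorized comparison, while a list of ids needs one comparison per id:

from typing import List

import torch

# Hypothetical illustration: a single ignore id needs one elementwise comparison.
def count_tokens_single(tokens: torch.Tensor, ignore_index: int) -> torch.Tensor:
    return (tokens != ignore_index).sum()

# A list of ignore ids needs one comparison (and one temporary mask) per id.
def count_tokens_multi(tokens: torch.Tensor, ignore_indices: List[int]) -> torch.Tensor:
    mask = torch.ones_like(tokens, dtype=torch.bool)
    for idx in ignore_indices:
        mask &= tokens != idx
    return mask.sum()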

:param kwargs: additional super class arguments
"""
super().__init__(
seq_num="seq_num", # collect log_probs - output of _count_seq_and_tokens_update

SagiPolaczek (Collaborator):

obsolete comments in this line and the following one

mosheraboh (Collaborator Author):

👍


def _count_seq_and_tokens_update(
batch_dict: dict,
encoder_input_key: str,

SagiPolaczek (Collaborator):

encoder_input_key: Union[str, None] 

or

encoder_input_key: Optional[str]

mosheraboh (Collaborator Author):

It's a must. Why optional?


def _count_seq_and_tokens_compute(
self,
seq_num: List[np.ndarray],

SagiPolaczek (Collaborator):

seq_num will be a numpy array such that each entry represents a batch? If so, how often are the metrics calculated? Each epoch?

I forgot these :)

mosheraboh (Collaborator Author):

Each sub-epoch, and each entry is a batch.

self,
seq_num: List[np.ndarray],
token_num: List[np.ndarray],
) -> float:

SagiPolaczek (Collaborator):

returns a dict

mosheraboh (Collaborator Author):

👍
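To illustrate the two points above (each list entry is one batch's count, and the result should be a dict rather than a float), a hedged sketch of what the compute step could look like, written as a standalone function with assumed names:

from typing import Dict, List

import numpy as np

# Hypothetical sketch: sum the per-batch counts collected over a sub-epoch.
def count_seq_and_tokens_compute(
    seq_num: List[np.ndarray],
    token_num: List[np.ndarray],
) -> Dict[str, int]:
    return {
        "seq_num": int(np.sum(seq_num)),
        "token_num": int(np.sum(token_num)),
    }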

batch_dict: dict,
encoder_input_key: str,
ignore_index: Optional[int] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:

SagiPolaczek (Collaborator):

-> dict[str, Tensor]

mosheraboh (Collaborator Author):

👍
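Along the same lines, a hedged sketch of a per-batch update step with the suggested dict return type (parameter names taken from the signature above; the body is assumed, not the merged implementation):

from typing import Dict, Optional

import torch

# Hypothetical sketch: count sequences and non-ignored tokens in one batch
# and return both counts in a dict of tensors.
def count_seq_and_tokens_update(
    batch_dict: dict,
    encoder_input_key: str,
    ignore_index: Optional[int] = None,
) -> Dict[str, torch.Tensor]:
    tokens = batch_dict[encoder_input_key]  # token ids, shape [batch_size, seq_len]
    seq_num = torch.tensor(tokens.shape[0])
    if ignore_index is None:
        token_num = torch.tensor(tokens.numel())
    else:
        token_num = (tokens != ignore_index).sum()
    return {"seq_num": seq_num, "token_num": token_num}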

SagiPolaczek (Collaborator) commented on Mar 20, 2024:

Last comment: did you try to write a test for it, so we'll have it covered?

If time doesn't permit, maybe open it as a card on Monday and we'll get to it later.
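For reference, a minimal test along those lines could look like the sketch below. It exercises the standalone update/compute sketches from earlier in this thread; names and values are illustrative, not the merged MetricCountSeqAndTokens API.

import torch

# Hypothetical test: 2 sequences, 3 non-pad tokens (pad id 0).
def test_count_seq_and_tokens() -> None:
    batch = {"encoder_input": torch.tensor([[5, 6, 0, 0], [7, 0, 0, 0]])}
    counts = count_seq_and_tokens_update(batch, "encoder_input", ignore_index=0)
    assert counts["seq_num"].item() == 2
    assert counts["token_num"].item() == 3

    totals = count_seq_and_tokens_compute(
        seq_num=[counts["seq_num"].numpy()],
        token_num=[counts["token_num"].numpy()],
    )
    assert totals == {"seq_num": 2, "token_num": 3}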

mosheraboh (Collaborator Author) left a comment:

Thanks for the review and useful comments @SagiPolaczek


SagiPolaczek (Collaborator) left a comment:

LGTM

SagiPolaczek (Collaborator):

Merging it to match inner-source code.

SagiPolaczek merged commit 9dc0639 into master on Mar 21, 2024. 5 checks passed.